The ARCOMEM Architecture for Social- and Semantic-Driven Web Archiving
نویسندگان
چکیده
The constantly growing amount of Web content and the success of the Social Web lead to increasing needs for Web archiving. These needs go beyond the pure preservation of Web pages. Web archives are turning into “community memories” that aim at building a better understanding of the public view on, e.g., celebrities, court decisions and other events. Due to the size of the Web, the traditional “collect-all” strategy is in many cases not the best method to build Web archives. In this paper, we present the ARCOMEM (From Future Internet 2014, 6 689 Collect-All Archives to Community Memories) architecture and implementation that uses semantic information, such as entities, topics and events, complemented with information from the Social Web to guide a novel Web crawler. The resulting archives are automatically enriched with semantic meta-information to ease the access and allow retrieval based on conditions that involve high-level concepts.
منابع مشابه
Exploiting the Social and Semantic Web for Guided Web Archiving
The constantly growing amount of Web content and the success of the Social Web lead to increasing needs for Web archiving. These needs go beyond the pure preservation of Web pages. Web archives are turning into “community memories” that aim at building a better understanding of the public view on, e.g., celebrities, court decisions, and other events. In this paper we present the ARCOMEM archite...
متن کاملARCOMEM Crawling Architecture
The World Wide Web is the largest information repository available today. However, this information is very volatile and Web archiving is essential to preserve it for the future. Existing approaches to Web archiving are based on simple definitions of the scope of Web pages to crawl and are limited to basic interactions with Web servers. The aim of the ARCOMEM project is to overcome these limita...
متن کاملExploiting Multimedia in Creating and Analysing Multimedia Web Archives
The data contained on the web and the social web are inherently multimedia and consist of a mixture of textual, visual and audio modalities. Community memories embodied on the web and social web contain a rich mixture of data from these modalities. In many ways, the web is the greatest resource ever created by human-kind. However, due to the dynamic and distributed nature of the web, its conten...
متن کاملQuery Architecture Expansion in Web Using Fuzzy Multi Domain Ontology
Due to the increasing web, there are many challenges to establish a general framework for data mining and retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, But the problem ...
متن کاملUsing Events for Content Appraisal and Selection in Web Archives
With the rapidly growing volume of resources on the Web, Web archiving becomes an important challenge. In addition, the notion of community memories extends traditional Web archives with related data from a variety of sources on the Social Web. Community memories take an entity-centric view to organise Web content according to the events and the entities related to them, such as persons, organi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Future Internet
دوره 6 شماره
صفحات -
تاریخ انتشار 2014